Plotting TFR Estimates

Introduction

In this short workbook, we outline how using R we can visualise and compare the various computations of the Palau TFR overtime. The data source is the World Population Prospects, United Nations (2022). What we aim to do is compare the different estimates with the aggregate WPP estimates.

Code

Below is the code used to build the scatter plot comparison.

Loading the Packages

First we load the tidyverse() library of R packages (Wickham et al. 2019), which includes the ggplot2() package (Wickham 2016), the dplyr() package (Wickham et al. 2022), and the plotly() package (Sievert 2020). We also use the ggthemes() package for cool styling later (Arnold 2021) and the gt() package for outputting html tables (Iannone et al. 2023).

library(tidyverse)
library(ggthemes)
library(plotly)
library(gt)

Loading the data

Next we load our two .csv data into R as two tibble objects. One, “e_tfr” is a compilation of all the different estimates for TFR by data source and method. The tibble has four columns: “data_souce, estimate_method, estimated_tfr, estimated_year”. The other, “wpp_tfr” is the WPP approximations for TFR 1960-2020 United Nations (2022)

# load the estimated tfr
e_tfr = read_csv("data/tfr_estimates.csv") %>%
  as_tibble() 

# load the wpp tfr 
wpp_tfr = e_tfr %>% filter(data_source == "WPP")

Source Table

# make a table of unique combos of Source + method
unique_combinations = unique(e_tfr[c("estimate_method", "data_source")])
unique_combinations = unique_combinations[order(unique_combinations$data_source,                                              unique_combinations$estimate_method), ] %>%
    rename(`Estimate Method` = estimate_method, `Data Source` = data_source)

# export as a gt table and set column background colors
unique_combinations %>%
  gt() %>%
  tab_style(style = cell_fill(color = "#1F78B4"),
            locations = cells_column_labels(columns = 1:2))
Estimate Method Data Source
Reverse survival method 1958 Census
Own-children method 1967 Census
Reverse survival method 1967 Census
Reverse survival method 1970 Census
Own-children method 1973 Census
Reverse survival method 1973 Census
Own-children method 1980 Census
Recent births 1980 Census
Reverse survival method 1980 Census
Reverse survival method 1986 Census
Own-children method 1990 Census
Reverse survival method 1990 Census
Cohort-completed fertility backdated by the mean age of childbearing 1995 Census
Own-children method 1995 Census
Reverse survival method 1995 Census
Cohort-completed fertility backdated by the mean age of childbearing 2000 Census
Own-children method 2000 Census
Reverse survival method 2000 Census
Cohort-completed fertility backdated by the mean age of childbearing 2005 Census
Own-children method 2005 Census
Reverse survival method 2005 Census
Own-children method 2012 Mini-Census
Reverse survival method 2012 Mini-Census
Reverse survival method 2015 Census
Recent births 2020 Census
Computed rate from DYB/WPP Register
Computed rate from NSO/WPP Register
Direct Register
Estimate WPP

Static Plots

Now, we use ggplot2() to compute the first static data visualisation. This shows all the different estimated TFR from the ‘e_tfr’ tibble.

# create the scatter plot
est = ggplot(e_tfr) +
  geom_line(aes(x = estimated_year,
                y = estimated_tfr,
                color = interaction(estimate_method, data_source))) +
  labs(x = "Estimated Year",
       y = "Estimated TFR",
       title = "Plot of TFR Estimates of Palau 1950-2020",
       color = "") +
  theme_solarized_2() +
  theme(legend.position = "bottom",
        legend.box = "vertical",
        legend.margin = margin()) +
  guides(color = guide_legend(nrow = 18, byrow = TRUE))

est

The next visualisation shows the estimates with just the WPP approximations

# create the scatter plot
wpp = ggplot(wpp_tfr, aes(x = estimated_year,
                          y = estimated_tfr,
                          color = "black"))+
  geom_line() +
  labs(x = "Estimated Year",
       y = "Estimated TFR",
       title = "WPP TFR Estimates for Palau 1950-2020",
       color = "") +
  theme_solarized_2() +
  theme(legend.position = "none")
wpp

Make interactive

Using the ggplotly function from the plotly package, we can make the plot interactive so it is possible to select which data you want to see and zoom + export a custom png. This is very challenging however and could also probably be achieved in Shiny with more ease (though I would have to host a backend which can be $$$)

# set plotly boundaries
est_plotly = ggplotly(est, width = 800, height = 2000)


# Create a list of buttons
buttons = list()

# Access the levels of the color scale for the correct order

unique_combinations = apply(unique_combinations, 1,
                            function(row) paste(row, collapse = "-"))

# Create a button for each level in the correct order
for (i in seq_along(unique_combinations)) {
  visibility_vals = rep(FALSE, length(unique_combinations))
  visibility_vals[i] = TRUE
  buttons[[length(buttons) + 1]] = list(
    method = "restyle",
    args = list("visible", visibility_vals),
    label = unique_combinations[i]
  )
}

# add a button to show all interactions
buttons[[length(buttons) + 1]] = list(
  method = "restyle",
  args = list("visible", rep(TRUE, length(unique_combinations))),
  label = "Show All"
)

# adjust layout
est_plotly = est_plotly %>% 
  layout(
    showlegend = FALSE,
    updatemenus = list(
      list(
        type = "buttons",
        direction = "down",
        y = -0.1,
        x = 0,
        xanchor = 'left',
        yanchor = 'top',
        buttons = buttons,
        tickfont = list(size = 5),    # adjust button font size
        pad = list(r = 10, t = 10),   # adjust padding
        showactive = TRUE
      )
    ),
    margin = list(
    b = 800  # adjust bottom margin to make space for buttons
    )
  )

# display the plot
est_plotly

Difference WPP and Estimates

Now I want to compute and visualise the difference betweent the WPP line and the different estimates

# drop WPP from data
calc_tfr = e_tfr %>% filter(data_source != "WPP")

# change cols in wpp
wpp_tfr = wpp_tfr %>% select(estimated_year, estimated_tfr) %>%
  rename(estimated_year = estimated_year, wpp_tfr = estimated_tfr)

# calc difference
calc_tfr = calc_tfr %>%
  left_join(wpp_tfr, by = "estimated_year") %>%
  mutate(difference = estimated_tfr - wpp_tfr)

We can visualise this data per year of estimate as a simple point and line of best fit graph

difference_model = lm(difference ~ estimated_year, data = calc_tfr)
r_squared = summary(difference_model)$r.squared

ggplot(calc_tfr, aes(x = estimated_year,
                     y = difference,
                     color = estimate_method,
                     shape = factor(data_source),
                     group = 1)) +
  geom_point() +
  geom_smooth(method = "lm",
              se = FALSE,
              color = "red",
              alpha = 0.3) +
  scale_shape_manual(values = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) +
  labs(title = "Scatter Plot with Line of Best Fit",
       color = "Estimate Method",
       shape = "Data Source") +
  xlab("Year of Estimate") +
  ylab("Difference better Estimated TFR and WPP estimate") +
  geom_text(aes(x = max(estimated_year), y = min(difference), 
                label = paste("R\u00B2 =", round(r_squared, 2))),
            hjust = 1, vjust = -1, color = "dark red") +
  geom_text(aes(x = max(estimated_year), y = min(difference) + 0.1, 
                label = paste("Regression Formula: ",
                              round(coefficients(difference_model)[1], 2),
                              round(coefficients(difference_model)[2], 2),
                               "x")),
            hjust = 1, vjust = -1.5, color = "dark red") +
  theme_solarized_2()
## `geom_smooth()` using formula = 'y ~ x'

Bibliography

Arnold, Jeffrey B. 2021. “Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’.” https://CRAN.R-project.org/package=ggthemes.
Iannone, Richard, Joe Cheng, Barret Schloerke, Ellis Hughes, Alexandra Lauer, and JooYoung Seo. 2023. “Gt: Easily Create Presentation-Ready Display Tables.” https://CRAN.R-project.org/package=gt.
Sievert, Carson. 2020. “Interactive Web-Based Data Visualization with r, Plotly, and Shiny.” https://plotly-r.com.
United Nations. 2022. “World Population Prospects - Population Division - United Nations.” https://population.un.org/wpp/.
Wickham, Hadley. 2016. “Ggplot2: Elegant Graphics for Data Analysis.” https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse 4: 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. “Dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.